New Subsampling Algorithms for Fast Least Squares Regression
Authors
Abstract
We address the problem of fast estimation of ordinary least squares (OLS) from large amounts of data ($n \gg p$). We propose three methods that solve the big data problem by subsampling the covariance matrix using either a single-stage or a two-stage estimation. All three run in time on the order of the input size, i.e., $O(np)$, and our best method, Uluru, gives an error bound of $O(\sqrt{p/n})$, which is independent of the amount of subsampling as long as it is above a threshold. We provide theoretical bounds for our algorithms in the fixed design setting (with Randomized Hadamard preconditioning) as well as in the sub-Gaussian random design setting. We also compare the performance of our methods on synthetic and real-world datasets and show that if the observations are i.i.d. and sub-Gaussian, one can subsample directly, without the expensive Randomized Hadamard preconditioning, and without loss of accuracy.
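The abstract describes estimating OLS by subsampling the covariance matrix in one or two stages. The sketch below illustrates both ideas under our own assumptions: uniform row sampling without replacement, the hypothetical function names `covariance_subsampling_ols` and `two_stage_ols`, and a two-stage rule (OLS on a subsample followed by one approximate Newton correction on the remaining rows) that is our reading of "two stage estimation", not necessarily the paper's Uluru. It skips the Randomized Hadamard preconditioning, i.e. it targets the i.i.d. sub-Gaussian case the abstract says can be subsampled directly.

```python
import numpy as np


def covariance_subsampling_ols(X, y, r, rng):
    """Single-stage sketch (hypothetical helper): subsample only X^T X.

    X^T y uses all n rows (O(np)); X^T X is estimated from r uniformly
    sampled rows, rescaled by n / r (O(r p^2)). Requires p <= r <= n.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=r, replace=False)  # uniform sampling, no replacement
    Xs = X[idx]
    cov_hat = (n / r) * (Xs.T @ Xs)  # unbiased estimate of X^T X
    return np.linalg.solve(cov_hat, X.T @ y)


def two_stage_ols(X, y, r, rng):
    """Two-stage sketch (hypothetical helper, assumed correction rule).

    Stage 1: OLS on r sampled rows.
    Stage 2: one Newton-type step on the remaining n - r rows, using the
    rescaled subsample Gram matrix in place of their exact X^T X.
    Requires p <= r < n.
    """
    n = X.shape[0]
    perm = rng.permutation(n)
    s, rest = perm[:r], perm[r:]
    Xs, ys = X[s], y[s]
    Xr, yr = X[rest], y[rest]
    gram_s = Xs.T @ Xs
    w_stage1 = np.linalg.solve(gram_s, Xs.T @ ys)
    # Correlation of the remaining rows with the stage-1 residuals (O(np)).
    g = Xr.T @ (yr - Xr @ w_stage1)
    # ((n - r) / r) * gram_s approximates Xr^T Xr, so this is an approximate Newton step.
    correction = (r / (n - r)) * np.linalg.solve(gram_s, g)
    return w_stage1 + correction


# Demo on synthetic i.i.d. Gaussian data (all sizes are illustrative choices).
rng = np.random.default_rng(0)
n, p, r = 20_000, 10, 2_000
X = rng.standard_normal((n, p))
w_true = rng.standard_normal(p)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w_ols = np.linalg.lstsq(X, y, rcond=None)[0]
w_cov = covariance_subsampling_ols(X, y, r, rng)
w_two = two_stage_ols(X, y, r, rng)
for name, w in [("full OLS", w_ols), ("cov. subsampling", w_cov), ("two-stage", w_two)]:
    print(f"{name:>16}: ||w - w_true|| = {np.linalg.norm(w - w_true):.4f}")
```

Both functions cost $O(np + rp^2)$ for $r$ sampled rows. On data like the demo, one would expect the single-stage error to scale with the norm of `w_true` and shrink as `r` grows, while the correction step should bring the two-stage estimate close to full OLS; this is an expectation from the construction, not a reproduction of the paper's experiments.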
Similar resources
Fast and Robust Least Squares Estimation in Corrupted Linear Models
Subsampling methods have been recently proposed to speed up least squares estimation in large scale settings. However, these algorithms are typically not robust to outliers or corruptions in the observed covariates. The concept of influence that was developed for regression diagnostics can be used to detect such corrupted observations as shown in this paper. This property of influence – for whi...
Analysis of Tidal Data via the Blockwise Bootstrap
We analyze tidal data from Port Mansfield, TX using Kunsch's (1989) blockwise bootstrap in the regression setting. In particular, we estimate the variability of parameter estimates in a harmonic analysis via block subsampling of residuals from a least squares fit. We see that naive least squares variance estimates can be either too large or too small depending on the strength of correlation and th...
Exact and approximate solutions of fuzzy LR linear systems: New algorithms using a least squares model and the ABS approach
We present a methodology for characterization and an approach for computing the solutions of fuzzy linear systems with LR fuzzy variables. As solutions, notions of exact and approximate solutions are considered. We transform the fuzzy linear system into a corresponding linear crisp system and a constrained least squares problem. If the corresponding crisp system is incompatible, then the fuzzy ...
NYTRO: When Subsampling Meets Early Stopping
Early stopping is a well known approach to reduce the time complexity for performing training and model selection of large scale learning machines. On the other hand, memory/space (rather than time) complexity is the main constraint in many applications, and randomized subsampling techniques have been proposed to tackle this issue. In this paper we ask whether early stopping and subsampling ide...
The Fast Subsampled-Updating Fast Affine Projection (FSU FAP) Algorithm
The Fast Affine Projection (FAP) Algorithm is the fast version of the AP algorithm, which is a generalization of the well-known Normalized Least-Mean-Square (NLMS) algorithm. The AP algorithm shows performance near that of the Recursive Least-Squares algorithms, while its computational complexity is nearly the same as that of the LMS algorithm. Moreover, recent research has enlightened th...
Publication date: 2013